I/O Complexity for Range Queries on Region Data Stored Using an R-tree
Authors
Abstract
In this paper we study the node distribution of an R-tree storing region data, such as islands, lakes, or human-inhabited areas. We show that real region datasets are packed in minimum bounding rectangles (MBRs) whose area distribution follows the same power law, named REGAL (REGion Area Law) [12], as that of the regions themselves. Moreover, these MBRs are in turn packed into MBRs following the same law, and so on iteratively, up to the root of the R-tree. Based on this observation, we are able to accurately estimate the search effort for range queries, the most prominent spatial operation, using a small number of easy-to-retrieve parameters. Experiments on a variety of real datasets (islands, lakes, human-inhabited areas) show that our estimation is accurate, with a maximum geometric average relative error within 30%.
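As a point of reference for the quantity being estimated, the search effort of a range query on an R-tree is commonly measured as the number of nodes read, and a non-root node is read exactly when its MBR intersects the query window. The following is a minimal brute-force sketch of that count, not the estimation method of the paper. The names `Rect` and `count_node_accesses`, and the representation of the tree as one list of node MBRs per level, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper type, not from the paper)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def intersects(self, other: "Rect") -> bool:
        # Two closed rectangles overlap iff they overlap on both axes.
        return (self.x_min <= other.x_max and other.x_min <= self.x_max
                and self.y_min <= other.y_max and other.y_min <= self.y_max)


def count_node_accesses(levels: List[List[Rect]], query: Rect) -> int:
    """Count the nodes a range query reads.

    Assumption: `levels` holds the MBRs of all non-root nodes, one list
    per tree level. The root is always read, hence the leading 1.
    Because a parent MBR encloses its children, any node whose MBR
    intersects the query also has an intersecting parent, so a flat
    scan over all levels gives the same count as a top-down traversal.
    """
    return 1 + sum(
        1 for level in levels for mbr in level if mbr.intersects(query)
    )


# Toy two-level example with made-up coordinates.
levels = [
    [Rect(0, 0, 10, 10), Rect(20, 20, 30, 30)],
    [Rect(0, 0, 4, 4), Rect(5, 5, 10, 10),
     Rect(20, 20, 25, 25), Rect(26, 26, 30, 30)],
]
query = Rect(3, 3, 6, 6)
accesses = count_node_accesses(levels, query)
```

An analytical estimate such as the one described in the abstract aims to predict this count from a few dataset parameters, without scanning the node MBRs.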
Similar resources
A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries
Data streams are infinite, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries over them are mostly one-pass. Executing such algorithms raises challenges such as memory limitations, scheduling, and accuracy of answers. They will be more ...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing aims to relieve organizations of the burden of database management. Since data is a critical asset of an organization, its privacy must be protected from outside adversaries and untrusted servers. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
An External-Memory Data Structure for Shortest Path
In this paper, we present results related to satisfying shortest path queries on a planar graph stored in external memory. N denotes the total number of vertices and edges in the graph and sort(N) denotes the number of input/output (I/O) operations required to sort an array of length N. 1) We describe a data structure for supporting bottom-up traversal of rooted trees in external memory. A tree...
Efficient SQL-Querying Method for Data Mining in Large Data Bases
Data mining can be understood as a process of extraction of knowledge hidden in very large data sets. Often data mining techniques (e.g. discretization or decision tree) are based on searching for an optimal partition of data with respect to some optimization criterion. In this paper, we investigate the problem of optimal binary partition of continuous attribute domain for large data se...
Skyline Queries in O(1) time?
The skyline of a set P of points (SKY(P)) consists of the "best" points with respect to minimization or maximization of the attribute values. A point p dominates another point q if p is as good as q in all dimensions and it is strictly better than q in at least one dimension. In this work, we focus on the static 2-d space and provide expected performance guarantees for 3-sided Range Skyline Q...